We were excited to base our report on this data because it was relatively tidy, had quite a few categorical variables, and offered options for additional columns to graph.
#This will allow us to filter through our data
library(tidyverse)
library(dplyr)
#This will help us plot figures to showcase our findings
library(ggplot2)
#This will help us organize and display our data as necessary
library(knitr)
library(kableExtra)
#This expands our plot uses
library(plotly)
#Disable scientific notation (a large penalty makes R prefer fixed notation)
options(scipen = 999)
Import the deaths-due-to-air-pollution data
deaths_df_old <- read.csv("death-rates-from-air-pollution.csv")
glimpse(deaths_df_old)
## Rows: 6,468
## Columns: 7
## $ Entity <chr> "Afghanistan", "Afghan…
## $ Code <chr> "AFG", "AFG", "AFG", "…
## $ Year <int> 1990, 1991, 1992, 1993…
## $ Air.pollution..total...deaths.per.100.000. <dbl> 299.4773, 291.2780, 27…
## $ Indoor.air.pollution..deaths.per.100.000. <dbl> 250.3629, 242.5751, 23…
## $ Outdoor.particulate.matter..deaths.per.100.000. <dbl> 46.44659, 46.03384, 44…
## $ Outdoor.ozone.pollution..deaths.per.100.000. <dbl> 5.616442, 5.603960, 5.…
We are going to rename the columns to shorter names and glimpse the data.
deaths_df <- deaths_df_old %>%
  rename(
    country = Entity,
    acronym = Code,
    year = Year,
    total_deaths = Air.pollution..total...deaths.per.100.000.,
    indoor_deaths = Indoor.air.pollution..deaths.per.100.000.,
    outdoor_deaths = Outdoor.particulate.matter..deaths.per.100.000.,
    ozone_deaths = Outdoor.ozone.pollution..deaths.per.100.000.
  )
glimpse(deaths_df)
## Rows: 6,468
## Columns: 7
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ acronym <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ year <int> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1…
## $ total_deaths <dbl> 299.4773, 291.2780, 278.9631, 278.7908, 287.1629, 288.0…
## $ indoor_deaths <dbl> 250.3629, 242.5751, 232.0439, 231.6481, 238.8372, 239.9…
## $ outdoor_deaths <dbl> 46.44659, 46.03384, 44.24377, 44.44015, 45.59433, 45.36…
## $ ozone_deaths <dbl> 5.616442, 5.603960, 5.611822, 5.655266, 5.718922, 5.739…
Variables that interest us here include:
Now, let’s take a look at the population data.
world_pop <- read.csv("population_total_long.csv")
glimpse(world_pop)
## Rows: 12,595
## Columns: 3
## $ Country.Name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra", "…
## $ Year <int> 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 196…
## $ Count <int> 54211, 8996973, 5454933, 1608800, 13411, 92418, 20481779,…
To get a general idea of the deaths data frame we made, let’s make a plot to see what’s happening. This is a plot of indoor versus outdoor deaths around the world by country.
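A minimal sketch of how a plot like this could be drawn with ggplot2 follows. The code behind the original figure is not shown in this report, so the scatter layout, the colour mapping, and the names `deaths_sample` and `indoor_outdoor_plot` are assumptions; the six rows are copied from the output printed in this report.

```r
library(ggplot2)

# A few rows copied from the output in this report (deaths per 100,000)
deaths_sample <- data.frame(
  country = rep(c("Afghanistan", "Canada"), each = 3),
  year = rep(1990:1992, times = 2),
  indoor_deaths = c(250.3629, 242.5751, 232.0439,
                    0.1461597, 0.1347912, 0.1247982),
  outdoor_deaths = c(46.44659, 46.03384, 44.24377,
                     21.82110, 21.40547, 21.06392)
)

# Scatter plot of indoor against outdoor deaths, one colour per country
indoor_outdoor_plot <- ggplot(
  deaths_sample,
  aes(x = indoor_deaths, y = outdoor_deaths, colour = country)
) +
  geom_point() +
  labs(x = "Indoor deaths per 100,000", y = "Outdoor deaths per 100,000")

indoor_outdoor_plot
```

With only two countries the legend stays readable; mapping every country in the full data to a colour is likely to crowd both the legend and the points.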
That plot is too crowded to read, so we chose two countries from each continent (one high-population and one low-population country) to graph.
We selected a high-population country from each continent and used the formula below to determine the target for the low-population country.
Low population = high population * 0.10
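The formula can be worked through in a few lines of base R. This is only a sketch: the populations are made up, and picking the country closest to the 10% target is an assumption about how the formula might be applied, not necessarily how the countries in this report were chosen.

```r
# Hypothetical populations for one continent (not taken from the data set)
pops <- data.frame(
  country = c("A", "B", "C", "D"),
  population = c(200e6, 35e6, 19e6, 5e6)
)

# High-population country: here simply the largest one
high <- pops[which.max(pops$population), ]

# Target for the low-population country: 10% of the high population
target <- high$population * 0.10

# Assumption: take the country whose population is closest to the target
low <- pops[which.min(abs(pops$population - target)), ]

high$country
low$country
```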
First, let’s look at a table of the high- and low-population countries using the world population data set.
Next, we are going to see the death count for the high- and low-population countries using the deaths data frame.
Lastly, we will join the population and deaths data for each country.
Combine the data based on continent.
joined_all <- right_join(deaths_df, world_pop, by=c('country' = 'Country.Name', 'year' = 'Year'))
head(joined_all)
## country acronym year total_deaths indoor_deaths outdoor_deaths
## 1 Afghanistan AFG 1990 299.4773 250.3629 46.44659
## 2 Afghanistan AFG 1991 291.2780 242.5751 46.03384
## 3 Afghanistan AFG 1992 278.9631 232.0439 44.24377
## 4 Afghanistan AFG 1993 278.7908 231.6481 44.44015
## 5 Afghanistan AFG 1994 287.1629 238.8372 45.59433
## 6 Afghanistan AFG 1995 288.0142 239.9066 45.36714
## ozone_deaths Count
## 1 5.616442 12412308
## 2 5.603960 13299017
## 3 5.611822 14485546
## 4 5.655266 15816603
## 5 5.718922 17075727
## 6 5.739174 18110657
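The join above keeps every row of the population data, including years with no death figures, which is why `na.omit()` appears in the steps that follow. A small sketch on made-up data shows the difference between `right_join()` and `inner_join()`; the `_toy` names and their values are hypothetical.

```r
library(dplyr)

# Made-up stand-ins for deaths_df and world_pop
deaths_toy <- tibble(
  country = c("A", "A"),
  year = c(1990L, 1991L),
  total_deaths = c(10, 9)
)
pop_toy <- tibble(
  Country.Name = c("A", "A", "A"),
  Year = c(1989L, 1990L, 1991L),
  Count = c(100L, 110L, 120L)
)

keys <- c("country" = "Country.Name", "year" = "Year")

# right_join keeps all population rows; 1989 has no match, so total_deaths is NA
right_toy <- right_join(deaths_toy, pop_toy, by = keys)

# inner_join keeps only the country-year pairs present in both tables
inner_toy <- inner_join(deaths_toy, pop_toy, by = keys)

nrow(right_toy)
nrow(inner_toy)
```

An inner join would make the later `na.omit()` calls unnecessary for these columns, at the cost of dropping population rows from before 1990.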
north_america <- joined_all %>% filter(country %in% c("United States", "Canada"))
head(na.omit(north_america))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Canada CAN 1990 23.74844 0.1461597 21.82110 2.024766
## 2 Canada CAN 1991 23.34036 0.1347912 21.40547 2.046623
## 3 Canada CAN 1992 23.00947 0.1247982 21.06392 2.069720
## 4 Canada CAN 1993 23.03293 0.1191081 21.03444 2.135114
## 5 Canada CAN 1994 22.60288 0.1107671 20.59547 2.152504
## 6 Canada CAN 1995 22.32566 0.1015955 20.28851 2.193303
## Count
## 1 27691138
## 2 28037420
## 3 28371264
## 4 28684764
## 5 29000663
## 6 29302311
south_america <- joined_all %>% filter(country %in% c("Brazil", "Chile"))
head(na.omit(south_america))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Brazil BRA 1990 74.96820 44.08928 28.36460 3.330584
## 2 Brazil BRA 1991 71.52505 41.12989 27.91653 3.272506
## 3 Brazil BRA 1992 69.97594 39.07269 28.37737 3.321153
## 4 Brazil BRA 1993 69.34644 37.34668 29.37063 3.439490
## 5 Brazil BRA 1994 66.74580 34.60871 29.48986 3.445359
## 6 Brazil BRA 1995 63.54859 31.67095 29.22721 3.430127
## Count
## 1 149003223
## 2 151648011
## 3 154259380
## 4 156849078
## 5 159432716
## 6 162019896
africa <- joined_all %>% filter(country %in% c("Nigeria", "Malawi"))
head(na.omit(africa))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Malawi MWI 1990 167.7156 153.3657 12.60813 3.518561
## 2 Malawi MWI 1991 167.8769 153.3428 12.77371 3.541273
## 3 Malawi MWI 1992 171.1963 156.2008 13.19234 3.618770
## 4 Malawi MWI 1993 175.2565 159.9608 13.45895 3.686304
## 5 Malawi MWI 1994 180.9753 164.9773 14.10506 3.784780
## 6 Malawi MWI 1995 183.4036 166.9812 14.48956 3.847709
## Count
## 1 9404500
## 2 9600355
## 3 9685973
## 4 9710331
## 5 9745690
## 6 9844415
europe <- joined_all %>% filter(country %in% c("Germany", "Serbia"))
head(na.omit(europe))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Germany DEU 1990 41.91322 1.600590 38.11494 2.724651
## 2 Germany DEU 1991 40.73815 1.472532 37.08854 2.694316
## 3 Germany DEU 1992 38.94425 1.367432 35.45345 2.622836
## 4 Germany DEU 1993 38.25349 1.275528 34.85003 2.623219
## 5 Germany DEU 1994 36.85860 1.182584 33.58411 2.573705
## 6 Germany DEU 1995 35.66449 1.109101 32.47285 2.557293
## Count
## 1 79433029
## 2 80013896
## 3 80624598
## 4 81156363
## 5 81438348
## 6 81678051
asia <- joined_all %>% filter(country %in% c("Pakistan", "Sri Lanka"))
head(na.omit(asia))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Pakistan PAK 1990 144.7155 104.4196 34.80304 10.09603
## 2 Pakistan PAK 1991 148.0120 105.5436 36.80428 10.35961
## 3 Pakistan PAK 1992 148.6560 105.2133 37.76577 10.35540
## 4 Pakistan PAK 1993 149.6526 104.9854 38.95704 10.37194
## 5 Pakistan PAK 1994 151.1992 105.3557 40.06784 10.44016
## 6 Pakistan PAK 1995 154.9523 107.2959 41.72728 10.67907
## Count
## 1 107647921
## 2 110778648
## 3 113911126
## 4 117086685
## 5 120362762
## 6 123776839
oceania <- joined_all %>% filter(country %in% c("Australia", "New Zealand"))
head(na.omit(oceania))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Australia AUS 1990 26.70503 0.6924006 25.72983 0.3285590
## 2 Australia AUS 1991 25.91503 0.6172074 25.02097 0.3222915
## 3 Australia AUS 1992 25.70745 0.5594191 24.86599 0.3286297
## 4 Australia AUS 1993 24.63559 0.4920491 23.86602 0.3232958
## 5 Australia AUS 1994 24.38185 0.4454673 23.65269 0.3300999
## 6 Australia AUS 1995 23.10038 0.3895721 22.43122 0.3244735
## Count
## 1 17065100
## 2 17284000
## 3 17495000
## 4 17667000
## 5 17855000
## 6 18072000
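The six filters above follow the same pattern, so one alternative is a single lookup table that tags each selected country with its continent. The sketch below uses the country pairs from this report; `continent_lookup`, `joined_toy`, and the values in `joined_toy` are hypothetical.

```r
library(dplyr)

# Lookup table for the twelve countries selected in this report
continent_lookup <- tibble(
  country = c("United States", "Canada", "Brazil", "Chile",
              "Nigeria", "Malawi", "Germany", "Serbia",
              "Pakistan", "Sri Lanka", "Australia", "New Zealand"),
  continent = rep(c("North America", "South America", "Africa",
                    "Europe", "Asia", "Oceania"), each = 2)
)

# Made-up stand-in for joined_all
joined_toy <- tibble(
  country = c("Canada", "Chile", "France"),
  year = c(1990L, 1990L, 1990L),
  total_deaths = c(20, 50, 30)
)

# Keep only the selected countries and attach their continent
selected <- inner_join(joined_toy, continent_lookup, by = "country")
selected
```

Countries that are not in the lookup (France in this toy data) drop out, and the `continent` column can then be used for grouping or faceting.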
This is a closer look at the population growth over time in both the high- and low-population countries that we selected.
Which country has the highest average death count?
Let’s make a table showing the high- and low-population countries and their respective death counts due to pollution.
Let’s see how this differs from continent to continent.
#Mean total deaths for each selected country, one table per continent
deaths_north <- na.omit(north_america) %>%
  group_by(country) %>%
  summarize(north_america_deaths = mean(total_deaths))
deaths_south <- na.omit(south_america) %>%
  group_by(country) %>%
  summarize(south_america_deaths = mean(total_deaths))
deaths_africa <- na.omit(africa) %>%
  group_by(country) %>%
  summarize(africa_deaths = mean(total_deaths))
deaths_europe <- na.omit(europe) %>%
  group_by(country) %>%
  summarize(europe_deaths = mean(total_deaths))
deaths_asia <- na.omit(asia) %>%
  group_by(country) %>%
  summarize(asia_deaths = mean(total_deaths))
deaths_oceania <- na.omit(oceania) %>%
  group_by(country) %>%
  summarize(oceania_deaths = mean(total_deaths))
#Table to view continent deaths
kable(deaths_north, caption = "North America Average Death Count")
| country | north_america_deaths |
|---|---|
| Canada | 18.18542 |
| United States | 26.35827 |
kable(deaths_south, caption = "South America Average Death Count")
| country | south_america_deaths |
|---|---|
| Brazil | 48.42928 |
| Chile | 36.51321 |
kable(deaths_africa, caption = "Africa Average Death Count")
| country | africa_deaths |
|---|---|
| Malawi | 147.7717 |
| Nigeria | 112.3016 |
kable(deaths_asia, caption = "Asia Average Death Count")
| country | asia_deaths |
|---|---|
| Pakistan | 144.33463 |
| Sri Lanka | 69.60383 |
kable(deaths_europe, caption = "Europe Average Death Count")
| country | europe_deaths |
|---|---|
| Germany | 28.10988 |
| Serbia | 80.66558 |
kable(deaths_oceania, caption = "Oceania Average Death Count")
| country | oceania_deaths |
|---|---|
| Australia | 17.76815 |
| New Zealand | 15.92536 |
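If the data carried a `continent` column, the six summaries above could be produced in one pass by grouping on both columns. This is a sketch on made-up numbers; `joined_toy`, its `continent` column, and `mean_deaths` are hypothetical.

```r
library(dplyr)

# Made-up stand-in for the joined data with a continent column added
joined_toy <- tibble(
  continent = c("North America", "North America", "North America",
                "North America", "Oceania", "Oceania"),
  country = c("Canada", "Canada", "United States",
              "United States", "Australia", "Australia"),
  total_deaths = c(20, 18, 28, 26, 18, 16)
)

# One grouped summary instead of one summary per continent
mean_deaths <- joined_toy %>%
  group_by(continent, country) %>%
  summarize(mean_total_deaths = mean(total_deaths), .groups = "drop")

mean_deaths
```

The result is a single table with one row per country, which is also easier to pass to `kable()` or `ggplot()` than six separate data frames.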
Here’s a graph to visualize the previous tables.
So we’ve looked at the deaths due to pollution, but what percentage of the population was affected?
To remove the leading zeros and clean up the y-axis, we multiplied ‘percent_high’ and ‘percent_low’ by 100,000, since the death data are reported per 100,000 people.
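The arithmetic behind a per-100,000 figure can be sketched as follows, using the Canada 1990 row printed earlier in this report. It assumes, as the original column names indicate, that `total_deaths` is a rate per 100,000 people; it is not necessarily the calculation used for ‘percent_high’ and ‘percent_low’, and the variable names are hypothetical.

```r
# Canada, 1990, as printed in the joined output of this report
rate_per_100k <- 23.74844   # total_deaths, assumed to be deaths per 100,000
population <- 27691138      # Count

# A rate per 100,000 expressed as a percentage of the population
percent_of_population <- rate_per_100k / 100000 * 100

# Estimated absolute number of deaths implied by the rate
estimated_deaths <- rate_per_100k * population / 100000

percent_of_population
estimated_deaths
```

The percentage has several leading zeros after the decimal point, which is why a per-100,000 scale is easier to read on an axis.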
Which type of pollution has the greatest number of deaths?
| country | avg_indoor | avg_outdoor | avg_ozone |
|---|---|---|---|
| Pakistan | 87.7427944 | 50.52063 | 10.440656 |
| Nigeria | 75.8755074 | 35.21678 | 2.117076 |
| Brazil | 19.4258385 | 26.84194 | 2.740342 |
| Germany | 0.7170881 | 25.47078 | 2.343892 |
| Australia | 0.2485867 | 17.20789 | 0.360452 |
| United States | 0.1656402 | 22.79947 | 3.915093 |
| country | avg_indoor | avg_outdoor | avg_ozone |
|---|---|---|---|
| Canada | 0.0651156 | 16.38423 | 1.9697041 |
| Chile | 8.6932699 | 27.17442 | 0.8504919 |
| Malawi | 132.1891749 | 13.81151 | 3.3870514 |
| New Zealand | 0.2908622 | 15.56872 | 0.0727512 |
| Serbia | 35.8762796 | 42.71254 | 2.9395671 |
| Sri Lanka | 44.5428441 | 24.77233 | 0.4304406 |
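One way to read the dominant pollution type off tables like these is to reshape them to long form and keep the largest average per country. The sketch below copies three rows from the tables above; `avg_by_type` and `dominant_type` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Three rows copied from the tables above
avg_by_type <- tibble(
  country = c("Pakistan", "Germany", "Malawi"),
  avg_indoor = c(87.7427944, 0.7170881, 132.1891749),
  avg_outdoor = c(50.52063, 25.47078, 13.81151),
  avg_ozone = c(10.440656, 2.343892, 3.3870514)
)

# Long form: one row per country and pollution type,
# then keep the type with the largest average for each country
dominant_type <- avg_by_type %>%
  pivot_longer(
    cols = starts_with("avg_"),
    names_to = "pollution_type",
    values_to = "avg_deaths"
  ) %>%
  group_by(country) %>%
  slice_max(avg_deaths, n = 1) %>%
  ungroup()

dominant_type
```

For these three rows, indoor pollution dominates in Pakistan and Malawi, while outdoor particulate matter dominates in Germany.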
Let’s look at the previous two decades and compare the death count.
This is the first decade, 1996-2006. Has there been a change?
Let’s graph the previous tables!
The first decade 1996-2006.
This shows the second decade 2007-2017.
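Splitting the data into the two periods can be sketched with `case_when()`. The year ranges come from this report; `years_toy`, `periods`, and the `period` column are hypothetical names.

```r
library(dplyr)

# A few example years, including one outside both periods
years_toy <- tibble(year = c(1990L, 1996L, 2006L, 2007L, 2017L))

# Label each year with its period; years outside both ranges become NA
periods <- years_toy %>%
  mutate(period = case_when(
    year >= 1996 & year <= 2006 ~ "1996-2006",
    year >= 2007 & year <= 2017 ~ "2007-2017",
    TRUE ~ NA_character_
  ))

periods
```

Note that each range, read inclusively, spans 11 calendar years, so both periods are compared on the same footing.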
By comparing each pollutant type, we can determine which year and country had the highest number of deaths.
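Finding the top row for a pollutant type can be sketched with `slice_max()`. The five rows below are copied from output printed earlier in this report, so the result describes this small sample only, not the full data set; `deaths_sample`, `top_indoor`, and `top_ozone` are hypothetical names.

```r
library(dplyr)

# Rows copied from the output shown earlier in this report
deaths_sample <- tibble(
  country = c("Afghanistan", "Afghanistan", "Pakistan", "Pakistan", "Germany"),
  year = c(1990L, 1991L, 1990L, 1991L, 1990L),
  indoor_deaths = c(250.3629, 242.5751, 104.4196, 105.5436, 1.600590),
  outdoor_deaths = c(46.44659, 46.03384, 34.80304, 36.80428, 38.11494),
  ozone_deaths = c(5.616442, 5.603960, 10.09603, 10.35961, 2.724651)
)

# Country and year with the highest value for each pollutant type
top_indoor <- deaths_sample %>% slice_max(indoor_deaths, n = 1)
top_ozone <- deaths_sample %>% slice_max(ozone_deaths, n = 1)

top_indoor
top_ozone
```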
Indoor Deaths
Outdoor Deaths
Ozone Deaths
Outdoor or indoor pollution?
Let’s reintroduce a graph we looked at earlier. This time, we will combine the pollutant types.
We cannot conclude which is worse.
[https://www.kaggle.com/datasets/akshat0giri/death-due-to-air-pollution-19902017]
[https://www.epa.gov/ground-level-ozone-pollution/ground-level-ozone-basics]
[https://www.health.nsw.gov.au/environment/air/Pages/outdoor-air-pollution.aspx]
[https://www.kaggle.com/datasets/imdevskp/world-population-19602018]